Prosody and phonetic variability: Lessons learned from acoustic model clustering

Authors

  • Izhak Shafran
  • Mari Ostendorf
  • Richard Wright

Abstract

Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates of prosodic structure. However, multiple sources of evidence suggest that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard in speech recognition. We find acoustic differences primarily associated with segment position at prosodic constituent onsets and at prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a “multiple pronunciation” acoustic model to aid in detecting fluent vs. disfluent prosodic boundaries, and potentially to improve recognition accuracy.
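To make the clustering idea concrete, here is a minimal sketch of one greedy step of likelihood-based decision-tree clustering, a common instance of the "standard" clustering used for acoustic models. That this is the exact procedure in the paper is an assumption. Each phone token carries a feature vector and yes/no prosodic labels; the labels `prominent` and `disfluent_boundary`, the helper names, the variance floor and the toy data are all hypothetical. A question that separates acoustically distinct tokens (here, fluent vs. disfluent boundary) yields the largest log-likelihood gain, which mirrors the finding that such phones land in different clusters.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# One phone token: an acoustic feature vector (e.g. averaged cepstra) plus
# boolean prosodic context labels. Label names here are hypothetical.
Segment = Tuple[Sequence[float], Dict[str, bool]]

VAR_FLOOR = 1e-3  # assumed variance floor; keeps log(var) finite for tiny clusters


def gaussian_loglik(vectors: List[Sequence[float]]) -> float:
    """Total log-likelihood of `vectors` under one diagonal Gaussian fit to them."""
    n = len(vectors)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(len(vectors[0])):
        col = [v[j] for v in vectors]
        mean = sum(col) / n
        ss = sum((x - mean) ** 2 for x in col)
        var = max(ss / n, VAR_FLOOR)  # ML variance, floored
        total += n * math.log(2 * math.pi * var) + ss / var
    return -0.5 * total


def split_gain(segments: List[Segment], label: str) -> float:
    """Likelihood gain from splitting a cluster on one yes/no prosodic question."""
    yes = [v for v, ctx in segments if ctx.get(label, False)]
    no = [v for v, ctx in segments if not ctx.get(label, False)]
    parent = [v for v, _ in segments]
    return gaussian_loglik(yes) + gaussian_loglik(no) - gaussian_loglik(parent)


def best_question(
    segments: List[Segment], labels: List[str], min_count: int = 2
) -> Optional[Tuple[str, float]]:
    """Greedy step of tree clustering: return the question with the largest gain."""
    best: Optional[Tuple[str, float]] = None
    for label in labels:
        n_yes = sum(1 for _, ctx in segments if ctx.get(label, False))
        # Skip questions that would leave either child with too little data.
        if n_yes < min_count or len(segments) - n_yes < min_count:
            continue
        gain = split_gain(segments, label)
        if best is None or gain > best[1]:
            best = (label, gain)
    return best


def example_segments() -> List[Segment]:
    """Toy data (not from the paper): disfluent-boundary tokens are shifted."""
    fluent = [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1], [1.1, 0.0]]
    disfluent = [[3.0, 1.0], [3.1, 0.9], [2.9, 1.1], [3.2, 1.0]]
    segs: List[Segment] = []
    for group, is_disfluent in ((fluent, False), (disfluent, True)):
        for i, vec in enumerate(group):
            # Prominence alternates, so it is unrelated to the acoustic shift.
            ctx = {"prominent": i % 2 == 0, "disfluent_boundary": is_disfluent}
            segs.append((vec, ctx))
    return segs


if __name__ == "__main__":
    print(best_question(example_segments(), ["prominent", "disfluent_boundary"]))
```

Growing a full tree would apply `best_question` recursively to each child until the gain falls below a threshold; the single-Gaussian gain is used because it can be computed from cluster statistics alone, without retraining models for every candidate split.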

Similar resources

Distributional Learning of Vowel Categories Is Supported by Prosody in Infant-Directed Speech

Infants’ acquisition of phonetic categories involves a distributional learning mechanism that operates on acoustic dimensions of the input. However, natural infant-directed speech shows large degrees of phonetic variability, and the resulting overlap between categories suggests that category learning based on distributional clustering may not be feasible without constraints on the learning proc...

An Acoustic Study of Emotivity-Prosody Interface in Persian Speech Using the Tilt Model

This paper aims to explore some acoustic properties (i.e. duration and pitch amplitude of speech) associated with three different emotions: anger, sadness and joy against neutrality as a reference point, all being intentionally expressed by six Persian speakers. The primary purpose of this study is to find out if there is any correspondence between the given emotions and prosody patterning in P...

Phonetic and speaker variations in automatic emotion classification

The speech signal contains information that characterises the speaker and the phonetic content, together with the emotion being expressed. This paper looks at the effect of this speaker- and phoneme-specific information on speech-based automatic emotion classification. The performances of a classification system using established acoustic and prosodic features for different phonemes are compared,...

Acquisition of prosody: The role of variability*

Although some phonetic variability is inevitable in speech production, adult speech is fairly consistent. Thus, part of becoming a competent adult speaker is learning to appropriately limit the variability in one’s speech. It is generally believed that phonology is mastered relatively early; however, this does not take into account the refinement of articulation required to rein in the variabi...

Prosody-dependent Acoustic Modeling for Mandarin Speech Recognition

A study on introducing prosodic information to acoustic modeling (AM) for speech recognition is reported in this paper. It extends the conventional context-dependent (CD) triphone HMM modeling approach to further consider the dependency of phone model on the break type of nearby inter-syllable boundary. Four break types are considered, including major break, minor break, normal non-break, and t...

Publication date: 2001